Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DDP unused param error when TE is enabled in NeMo Lite #11364

Merged
merged 3 commits into from
Nov 24, 2024

Conversation

oyilmaz-nvidia
Copy link
Collaborator

What does this PR do ?

Fixed DDP issue with TE

@oyilmaz-nvidia oyilmaz-nvidia marked this pull request as ready for review November 22, 2024 17:45
if self.model_accelerator == "te":
from nemo.lightning.pytorch.accelerate.transformer_engine import te_accelerate

te_accelerate(self.model, fp8_autocast=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you intend other settings like fp8 autocast to be specified?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I guess I should change that to a parameter class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, change it to partial function as @akoumpa suggested.

@@ -75,17 +75,9 @@ def squad(tokenizer) -> pl.LightningDataModule:
grad_clip = None
use_dist_samp = False

model = llm.HfAutoModelForCausalLM(args.model)
model = llm.HfAutoModelForCausalLM(model_name=args.model, model_accelerator=args.model_accelerator)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pass a partial function instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about it but I am just not sure how it'll look from the user perspective.

Copy link
Contributor

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module hf
examples/llm/sft/hf.py:74:0: C0301: Line too long (162/119) (line-too-long)
examples/llm/sft/hf.py:26:0: C0115: Missing class docstring (missing-class-docstring)
examples/llm/sft/hf.py:39:0: C0116: Missing function or method docstring (missing-function-docstring)
examples/llm/sft/hf.py:23:0: W0611: Unused ModelCallback imported from nemo.lightning.pytorch.callbacks (unused-import)
************* Module nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm
nemo/collections/llm/gpt/model/hf_auto_model_for_causal_lm.py:26:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/llm/gpt/model/hf_auto_model_for_causal_lm.py:34:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/llm/gpt/model/hf_auto_model_for_causal_lm.py:58:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/llm/gpt/model/hf_auto_model_for_causal_lm.py:69:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/llm/gpt/model/hf_auto_model_for_causal_lm.py:72:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/llm/gpt/model/hf_auto_model_for_causal_lm.py:89:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/llm/gpt/model/hf_auto_model_for_causal_lm.py:101:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/llm/gpt/model/hf_auto_model_for_causal_lm.py:115:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/llm/gpt/model/hf_auto_model_for_causal_lm.py:126:4: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 8.93/10

Thank you for improving NeMo's documentation!

Copy link
Contributor

[🤖]: Hi @oyilmaz-nvidia 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully

So it might be time to merge this PR or get some approvals

I'm just a bot so I'll leave it you what to do next.

//cc @pablo-garay @ko3n1g

@oyilmaz-nvidia oyilmaz-nvidia merged commit 3afcde0 into main Nov 24, 2024
169 of 170 checks passed
@oyilmaz-nvidia oyilmaz-nvidia deleted the onur/te-ddp-support branch November 24, 2024 22:27
ananthsub pushed a commit to ananthsub/NeMo that referenced this pull request Nov 25, 2024
)

* Fix DDP unused param error when TE is enabled

Signed-off-by: Onur Yilmaz <[email protected]>

* Added partial function for te

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

---------

Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
Co-authored-by: oyilmaz-nvidia <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants